Selecting subsets of features that differentiate between two conditions is a key task in a broad range of scientific domains. In many applications, the features of interest form clusters with similar effects on the data at hand. To recover such clusters we develop DiSC, a data-driven approach for detecting groups of features that differentiate between conditions. For each condition, we construct a graph whose nodes correspond to the features and whose weights are functions of the similarity between them for that condition. We then apply a spectral approach to compute subsets of nodes whose connectivity differs significantly between the condition-specific feature graphs. On the theoretical front, we analyze our approach with a toy example based on the stochastic block model. We evaluate DiSC on a variety of datasets, including MNIST, hyperspectral imaging, simulated scRNA-seq and task fMRI, and demonstrate that DiSC uncovers features that better differentiate between conditions compared to competing methods.
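The following toy sketch illustrates the general recipe the abstract describes: build a similarity graph over features for each condition and compare the two graphs spectrally. The correlation-based weights, the normalized-Laplacian difference, and the scoring rule are illustrative assumptions, not DiSC's actual construction.

```python
# Rough sketch only: per-condition feature graphs compared spectrally.
# The similarity measure, normalization, and scoring below are illustrative
# choices, not the authors' exact method.
import numpy as np

def feature_graph(X):
    """Weighted feature graph from samples-by-features data (absolute correlation)."""
    W = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(W, 0.0)
    return W

def normalized_laplacian(W):
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def differential_scores(X_a, X_b, k=5):
    """Score features by their weight in the leading eigenvectors of the
    difference of the two condition-specific Laplacians (illustrative)."""
    L_diff = normalized_laplacian(feature_graph(X_a)) - normalized_laplacian(feature_graph(X_b))
    # L_diff is symmetric, so eigh applies; large-magnitude eigenvalues capture
    # the strongest connectivity differences between the two graphs
    vals, vecs = np.linalg.eigh(L_diff)
    idx = np.argsort(-np.abs(vals))[:k]
    return (vecs[:, idx] ** 2).sum(axis=1)

rng = np.random.default_rng(0)
X_a, X_b = rng.normal(size=(200, 50)), rng.normal(size=(200, 50))
print(differential_scores(X_a, X_b).shape)  # one score per feature
```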
Graphons are general and powerful models for generating graphs of varying size. In this paper, we propose to directly model graphons using neural networks, obtaining Implicit Graphon Neural Representation (IGNR). Existing work in modeling and reconstructing graphons often approximates a target graphon by a fixed resolution piece-wise constant representation. Our IGNR has the benefit that it can represent graphons up to arbitrary resolutions, and enables natural and efficient generation of arbitrary sized graphs with desired structure once the model is learned. Furthermore, we allow the input graph data to be unaligned and have different sizes by leveraging the Gromov-Wasserstein distance. We first demonstrate the effectiveness of our model by showing its superior performance on a graphon learning task. We then propose an extension of IGNR that can be incorporated into an auto-encoder framework, and demonstrate its good performance under a more general setting of graphon learning. We also show that our model is suitable for graph representation learning and graph generation.
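A toy illustration of the "implicit" idea: a small network maps coordinates (u, v) in [0,1]^2 to an edge probability, so graphs of any size can be sampled once the function is learned. The architecture, symmetrization, and names below are assumptions for exposition, not the IGNR model itself.

```python
# Toy implicit graphon: an MLP over [0,1]^2, symmetrized, sampled at any resolution.
import torch
import torch.nn as nn

class ImplicitGraphon(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, u, v):
        # symmetrize so W(u, v) == W(v, u), as a graphon requires
        p_uv = self.net(torch.stack([u, v], dim=-1))
        p_vu = self.net(torch.stack([v, u], dim=-1))
        return 0.5 * (p_uv + p_vu).squeeze(-1)

def sample_graph(graphon, n):
    """Sample an n-node graph: draw latent positions, evaluate edge probabilities."""
    x = torch.rand(n)
    u, v = torch.meshgrid(x, x, indexing="ij")
    probs = graphon(u.reshape(-1), v.reshape(-1)).reshape(n, n)
    adj = torch.bernoulli(torch.triu(probs, diagonal=1))
    return adj + adj.T  # undirected, no self-loops

g = ImplicitGraphon()
print(sample_graph(g, 30).shape)  # torch.Size([30, 30])
```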
KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
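For concreteness, a minimal sketch of a KL-regularized actor loss of the kind the abstract refers to is shown below, assuming Gaussian policies and a fixed penalty weight; this is a generic form, not the paper's exact objective or implementation.

```python
# Generic KL-regularized actor loss: maximize Q while staying close (in KL)
# to a behavioral reference policy pi_0 derived from demonstrations.
# Names and the Gaussian parameterization are illustrative assumptions.
import torch
from torch.distributions import Normal, kl_divergence

def kl_regularized_actor_loss(q_values, policy_mu, policy_std, ref_mu, ref_std, alpha=0.1):
    pi = Normal(policy_mu, policy_std)
    pi_ref = Normal(ref_mu, ref_std)
    kl = kl_divergence(pi, pi_ref).sum(dim=-1)   # per-state KL over action dimensions
    return (-q_values + alpha * kl).mean()

# toy usage on a batch of 8 states with 3-dimensional actions; a sharply peaked
# parametric reference (small ref_std) is the kind of setting the abstract
# flags as prone to pathological training dynamics
q = torch.randn(8)
mu, std = torch.zeros(8, 3), torch.ones(8, 3)
ref_mu, ref_std = torch.zeros(8, 3), 0.1 * torch.ones(8, 3)
print(kl_regularized_actor_loss(q, mu, std, ref_mu, ref_std))
```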
State-of-the-art language models are often accurate on many question-answering benchmarks with well-defined questions. Yet, in real settings questions are often unanswerable without asking the user for clarifying information. We show that current SotA models often do not ask the user for clarification when presented with imprecise questions and instead provide incorrect answers or "hallucinate". To address this, we introduce CLAM, a framework that first uses the model to detect ambiguous questions, and if an ambiguous question is detected, prompts the model to ask the user for clarification. Furthermore, we show how to construct a scalable and cost-effective automatic evaluation protocol using an oracle language model with privileged information to provide clarifying information. We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering dataset derived from TriviaQA.
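The clarify-or-answer loop can be sketched roughly as follows; the `llm` and `user` callables are hypothetical stand-ins (any chat/completion API and, during evaluation, the oracle model with privileged information), and the prompts are illustrative rather than the paper's wording.

```python
# Sketch of a clarify-or-answer loop; `llm` and `user` are hypothetical stand-ins.
def clam_answer(question, llm, user):
    """Ask for clarification only when the model itself flags the question as ambiguous."""
    verdict = llm(f"Is the following question ambiguous? Answer yes or no.\n\n{question}")
    if verdict.strip().lower().startswith("yes"):
        clarifying_q = llm(f"Ask one short clarifying question for:\n\n{question}")
        clarification = user(clarifying_q)   # in evaluation, an oracle LM provides this
        question = f"{question}\n(Clarification: {clarification})"
    return llm(f"Answer concisely:\n\n{question}")

# toy usage with stub callables
fake_llm = lambda prompt: "yes" if "ambiguous" in prompt else "42"
fake_user = lambda q: "I mean the 2022 edition."
print(clam_answer("Who won the cup?", fake_llm, fake_user))
```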
Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e. interpolate) the training data. This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work we provide a theoretical justification for these observations. We prove that -- even in the simplest of settings -- any interpolating learning rule (with arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that -- in the same setting -- successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.
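As a generic example of an invariance-inducing regularizer of the kind discussed above (not the paper's construction), one can penalize the gap in average loss across groups or environments; note that for an interpolating model both terms are near zero, which matches the intuition that such penalties lose their force in the interpolation regime.

```python
# Illustration only: a group-gap penalty added to the empirical risk.
import torch

def invariance_penalized_loss(losses, groups, lam=1.0):
    """losses: per-example losses; groups: integer group id per example."""
    group_means = torch.stack([losses[groups == g].mean() for g in torch.unique(groups)])
    gap = group_means.max() - group_means.min()   # zero iff average risk is equal across groups
    # if the model interpolates, losses ~ 0 and the penalty is vacuous
    return losses.mean() + lam * gap

losses = torch.tensor([0.2, 0.1, 0.9, 0.8])
groups = torch.tensor([0, 0, 1, 1])
print(invariance_penalized_loss(losses, groups))
```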
Offline reinforcement-learning (RL) algorithms learn to make decisions using a given, fixed training dataset without the possibility of additional online data collection. This problem setting is appealing because it holds the promise of utilizing previously collected datasets without any costly or risky interaction with the environment. However, this promise is also the source of the setting's main drawback: the restricted dataset induces subjective uncertainty, because the agent can encounter unfamiliar sequences of states and actions that the training data did not cover. Moreover, inherent system stochasticity further increases uncertainty and aggravates the offline RL problem, preventing the agent from learning an optimal policy. To mitigate the destructive uncertainty effects, we need to balance the aspiration to take reward-maximizing actions with the incurred risk due to incorrect ones. In financial economics, modern portfolio theory (MPT) is a method that risk-averse investors can use to construct diversified portfolios that maximize their returns without unacceptable levels of risk. We integrate MPT into the agent's decision-making process to present a simple-yet-highly-effective risk-aware planning algorithm for offline RL. Our algorithm allows us to systematically account for the estimated quality of specific actions and their estimated risk due to the uncertainty. We show that our approach can be coupled with the Transformer architecture to yield a state-of-the-art planner for offline RL tasks, maximizing the return while significantly reducing the variance.
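A minimal sketch of the mean-variance idea borrowed from MPT, applied here to ranking candidate actions or action sequences, might look as follows; the ensemble-style return estimates and the risk-penalty weight are illustrative assumptions, not the paper's planner.

```python
# Mean-variance action selection: rank candidates by estimated return minus
# a penalty on estimated risk (variance). Illustrative, not the paper's method.
import numpy as np

def risk_aware_choice(return_samples, risk_weight=1.0):
    """return_samples: array of shape (num_candidates, num_estimates),
    e.g. returns predicted by an ensemble or by sampling a stochastic model."""
    mean = return_samples.mean(axis=1)
    var = return_samples.var(axis=1)
    scores = mean - risk_weight * var   # risk-adjusted value of each candidate
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(1)
candidates = rng.normal(loc=[1.0, 1.2, 0.9], scale=[0.1, 1.5, 0.2], size=(5, 3)).T
best, scores = risk_aware_choice(candidates, risk_weight=0.5)
print(best, np.round(scores, 2))
```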
Policy-gradient methods are widely used for learning control policies. They can easily be distributed to multiple workers and reach state-of-the-art results in many domains. Unfortunately, they exhibit large variance and consequently suffer from high sample complexity, since they aggregate gradients over entire trajectories. At the other extreme, planning methods, such as tree search, optimize the policy using single-step transitions that take future lookahead into account. These approaches have mainly been used with value-based algorithms. Planning-based algorithms require a forward model and are computationally intensive at every step, but are more sample-efficient. In this work, we introduce SoftTreeMax, the first approach that integrates tree search into policy gradients. Traditionally, the gradient is computed for single state-action pairs. Instead, our tree-based policy structure leverages all of the gradients at the tree leaves in every environment step. This allows us to reduce the variance of the gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradients. On Atari, SoftTreeMax demonstrates up to 5x better performance in faster run-time compared with distributed PPO.
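A rough sketch of a tree-based softmax policy in this spirit is given below: the logit of each root action pools, via log-sum-exp, the cumulative rewards plus value estimates of all leaves expanded under that action. The pooling rule and interface are assumptions made for illustration, not the paper's definition of SoftTreeMax.

```python
# Illustrative tree-based softmax policy over root actions.
import numpy as np

def logsumexp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def soft_tree_policy(leaf_returns_per_action, temperature=1.0):
    """leaf_returns_per_action[a]: array of (cumulative discounted reward + leaf
    value estimate) for every depth-d leaf reachable under root action a."""
    logits = np.array([logsumexp(np.asarray(leaves) / temperature)
                       for leaves in leaf_returns_per_action])
    probs = np.exp(logits - logsumexp(logits))
    return probs

# toy example: 3 root actions with different numbers of expanded leaves
leaves = [np.array([1.0, 0.5, 0.7]), np.array([2.0]), np.array([0.1, 0.2])]
print(soft_tree_policy(leaves), soft_tree_policy(leaves).sum())
```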
Training deep neural networks in low rank, i.e. with factorized layers, is of particular interest to the community: it offers efficiency over unfactorized training in terms of both memory consumption and training time. Prior work has focused on low-rank approximations of pre-trained networks and on training in low-rank space with additional objectives, offering various ad hoc explanations for the chosen practices. We analyze techniques that work well in practice and, through extensive ablations on models such as GPT2, provide evidence that challenges common beliefs in the field, hinting in the process at exciting research opportunities that still need to be answered.
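For concreteness, a factorized linear layer of the kind meant by "low-rank training" can be sketched as below; the initialization and PyTorch phrasing are illustrative choices rather than the setup studied in the paper.

```python
# A d_out x d_in weight replaced by the product of two thin matrices,
# cutting parameters and memory when the rank r is small.
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.U = nn.Parameter(torch.empty(d_out, rank))
        self.V = nn.Parameter(torch.empty(rank, d_in))
        nn.init.kaiming_uniform_(self.U, a=5 ** 0.5)
        nn.init.kaiming_uniform_(self.V, a=5 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        # (x @ V^T) @ U^T costs O(r(d_in + d_out)) per example instead of O(d_in * d_out)
        return x @ self.V.T @ self.U.T + self.bias

layer = FactorizedLinear(d_in=768, d_out=768, rank=64)
print(sum(p.numel() for p in layer.parameters()))  # ~99k vs ~590k for a dense layer
```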
Recently, Daniely and Granot [arXiv:1910.05697] introduced a new notion of complexity called Approximate Description Length (ADL). They used it to derive new generalization bounds for neural networks that, despite much effort, had remained beyond the reach of more classical techniques such as discretization, covering numbers, and Rademacher complexity. In this paper, we explore the relation of ADL to classical notions of function complexity such as covering numbers and VC dimension. We find that for functions whose range is the reals, ADL is essentially equivalent to these classical complexity measures. However, this equivalence breaks down for functions with a high-dimensional range.
We suggest the first system that performs real-time semantic segmentation via deep learning on a weak micro-computer, such as the Raspberry Pi Zero v2 (which costs about $15) attached to a toy drone. In particular, since the Raspberry Pi weighs less than 16 grams and its size is half that of a credit card, we could easily attach it to the common commercial DJI Tello toy drone (<$100, <90 grams, 98 x 92.5 x 41 mm). The result is an autonomous drone (no laptop and no human in the loop) that can detect and classify objects in real time from its on-board monocular RGB camera (no GPS or LIDAR sensors). The companion video demonstrates how this Tello drone scans the lab for people (e.g., for use by firefighters or security forces) and for an empty parking slot outside the lab. Existing deep-learning solutions are either too slow for real-time computation on such IoT devices, or provide results of impractical quality. Our main challenge was to design a system that takes the best of all worlds among the numerous combinations of networks, deep-learning platforms/frameworks, compression techniques, and compression ratios. To this end, we provide an efficient search algorithm that aims to find the optimal combination, yielding the best trade-off between the network's running time and its accuracy/performance.
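The search can be pictured, in simplified brute-force form, as enumerating (network, framework, compression, ratio) combinations, keeping those that meet a real-time budget on the device, and returning the most accurate survivor; the paper's algorithm is an efficient search rather than this exhaustive loop, and `benchmark` below is a hypothetical stand-in for on-device measurement.

```python
# Simplified brute-force stand-in for the configuration search; illustration only.
from itertools import product

def best_configuration(networks, frameworks, compressions, ratios, benchmark, max_ms=100.0):
    best, best_acc = None, -1.0
    for cfg in product(networks, frameworks, compressions, ratios):
        runtime_ms, accuracy = benchmark(cfg)   # measured on-device (hypothetical)
        if runtime_ms <= max_ms and accuracy > best_acc:
            best, best_acc = cfg, accuracy
    return best, best_acc

# toy usage with a fake benchmark
fake = lambda cfg: (50.0 + 10 * len(cfg[0]), 0.5 + 0.01 * cfg[3])
print(best_configuration(["unet", "enet"], ["tflite"], ["pruning", "quant"], [2, 4, 8], fake))
```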